MG205: Econometrics Theory and Applications

Topic 8: Exploiting Time Variation

José Ignacio González Rojas

London School of Economics and Political Science

March 2, 2026

Part I: First Differences and Fixed Effects

Cross-Sectional Data Cannot Separate Heterogeneity from Treatment Effects

Panel Data Gives Us New Tools to Address Endogeneity

The problem

  • Endogeneity: \(\text{Cov}(x_{it}, e_{it}) \neq 0\)
  • Violation of Assumption 5 \(\Rightarrow\) no identification
  • OLS gives biased estimates of the parameters of interest
  • Cross-sectional data alone cannot fix this

Today

  • Assume a particular error structure: \(e_{it} = \alpha_i + u_{it}\)
  • With panel data, construct estimators invariant to \(\alpha_i\)

Following the same units over time enables new identification and estimation strategies.

Two Estimators Remove Unit-Level Unobserved Heterogeneity

First Differences and LSDV

First Differences (FD)

  • Population model: \(y_{it} = \beta x_{it} + \alpha_i + u_{it}\)
  • Subtract consecutive observations:

\[\Delta y_{it} = \beta \Delta x_{it} + \Delta u_{it}\]

  • \(\alpha_i - \alpha_i = 0\): unobserved heterogeneity disappears

Least Squares Dummy Variables (LSDV)

  • Include a dummy for each unit \(i\):

\[y_{it} = \beta x_{it} + \sum_{j=1}^{N} \gamma_j \mathbb{1}[i=j] + u_{it}\]

  • The dummies absorb \(\alpha_i\)
  • Equivalent to FD for \(T=2\) (we prove this later)
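The equivalence for \(T=2\) can be checked numerically. The sketch below is an illustration under invented assumptions (simulated data, \(\beta = 1\), \(x_{it}\) built to be correlated with \(\alpha_i\)); the variable names are hypothetical and not part of the course material.

```python
import numpy as np

rng = np.random.default_rng(0)
N, beta = 500, 1.0                                  # hypothetical sample size and true slope

alpha = rng.normal(size=N)                          # unit-level unobserved heterogeneity
x = alpha[:, None] + rng.normal(size=(N, 2))        # x is correlated with alpha (endogeneity)
y = beta * x + alpha[:, None] + rng.normal(scale=0.5, size=(N, 2))

# Pooled OLS with an intercept: alpha stays in the error, so the slope is biased
X_pool = np.column_stack([np.ones(2 * N), x.ravel()])
b_pooled = np.linalg.lstsq(X_pool, y.ravel(), rcond=None)[0][1]

# First differences: regress delta-y on delta-x (alpha cancels)
dx, dy = x[:, 1] - x[:, 0], y[:, 1] - y[:, 0]
b_fd = (dx @ dy) / (dx @ dx)

# LSDV: one dummy per unit (rows are ordered unit by unit)
D = np.kron(np.eye(N), np.ones((2, 1)))
X_lsdv = np.column_stack([x.ravel(), D])
b_lsdv = np.linalg.lstsq(X_lsdv, y.ravel(), rcond=None)[0][0]

print(f"pooled: {b_pooled:.3f}  FD: {b_fd:.3f}  LSDV: {b_lsdv:.3f}")
```

In this design FD and LSDV agree to numerical precision, while pooled OLS is pushed away from the true slope by the correlation between \(x_{it}\) and \(\alpha_i\).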

Exercise 1: Unobserved Heterogeneity Biases Cross-Sectional Estimates

Airline Fares Depend on Unobserved Route and Time Characteristics

Two Sources of Omitted Variable Bias

\[ \begin{aligned} \log(\text{fare})_{it} &= \beta_0 + \beta_1\log(\text{distance})_i + \beta_2\text{competition}_{it} + e_{it} \\ e_{it} &= \gamma_i + \delta_t + u_{it} \end{aligned} \]

  • \(\gamma_i\): route-specific, time-invariant unobserved heterogeneity
    • Business relationships
    • Airport amenities
  • \(\delta_t\): common time shocks
    • Fuel prices
    • Economic conditions
  • \(u_{it}\) is idiosyncratic error
  • We worry that \(\text{Cov}(\text{competition}_{it}, \gamma_i) \neq 0\)
    • Model not identified
    • \(\hat{\beta}\) might be biased

Controlling for Distance and Year Dummies Does Not Remove Route Heterogeneity

The Proposed Model Falls Short

Estimated model

\[\begin{align*} \widehat{\log(\text{fare})}_{it} &= \hat{\beta}_{0} + \hat{\beta}_{1}\log(\text{distance})_{i} \\ &+ \hat{\beta}_{2}\text{competition}_{it} \\ &+ \hat{\delta}_{1}\mathbb{1}[t=2007] \\ &+ \hat{\delta}_{2}\mathbb{1}[t=2012] \end{align*}\]

What remains in the error?

  • Recall: \(e_{it} = \gamma_i + \delta_t + u_{it}\)
  • The year dummies address common time trends (\(\delta_t\))
  • \(\gamma_i\) remains in the error

Since \(\text{Cov}(\text{competition}_{it}, \gamma_i) \neq 0\), OLS is biased.

First-Differencing Eliminates Route-Level Unobserved Heterogeneity

The First-Difference Estimator

  • First-difference operator: \(\Delta x_{it} = x_{it} - x_{it-1}\)
  • Example: Take the USA–UK route. Subtract 2002 from 2007, and 2007 from 2012.

\[ \Delta\log(\text{fare})_{it} = \beta_2\Delta\text{competition}_{it} + \Delta\delta_t + \Delta u_{it} \]

\(\gamma_i - \gamma_i = 0\): time-invariant route characteristics disappear.

With year dummies for transition periods (2002–2007 base, 2007–2012):

\[ \widehat{\Delta\log(\text{fare})}_{it} = \hat\alpha + \hat\beta_2\Delta\text{competition}_{it} + \hat\delta\mathbb{1}[\text{transition } 2007-2012] \]

Combining FD with Year Dummies Addresses Both Sources

Two Strategies for Two-Way Heterogeneity

FD removes \(\gamma_i\) (unit FE)

  • Subtract consecutive observations
  • \(\gamma_i - \gamma_i = 0\)
  • Time-invariant variables also drop out: \(\Delta\log(\text{distance})_i = 0\)

Year dummies absorb \(\delta_t\) (time FE)

  • Include dummies for transition periods
  • Common time shocks captured
  • This is LSDV applied to time effects

(1) FD + year dummies, or (2) full LSDV with dummies for both units and time periods.

Rejecting the Null Does Not Validate the Model

The Trap

  • With robust \(t\)-statistics, we reject \(H_{0}: \beta_{\text{competition}} = 0\)
  • But the null assumes the model is correctly specified
  • If OVB remains (route-level heterogeneity not addressed), \(\hat{\beta}\) is biased
  • Statistical significance \(\neq\) valid causal interpretation
  • Without exogeneity, the estimate is only a linear projection coefficient: a descriptive association, not a causal effect
  • FD reduces bias from time-invariant confounders but does not eliminate all sources

Exercise 2: Time Effects Capture Industry-Wide Patent Growth

Industry-Wide Patent Growth Requires Flexible Time Effects

Setting

  • 37 pharmaceutical firms
  • 2005-2007
  • No OVB concerns
    • Causal interpretation
  • Patents growing industry-wide
    • Regardless of individual firm R&D

Model

\[\begin{align*} \log(\text{patents})_{it} &= \beta_0 + \beta_1\log(\text{R\&D})_{it} \\ &+ \beta_2\mathbb{1}[t=2006] + \beta_3\mathbb{1}[t=2007] \\ &+ e_{it} \end{align*}\]

  • Could model the trend linearly or quadratically
  • Year dummies allow any form — nonparametric
  • \(\beta_{1}\): elasticity of patents w.r.t. R&D (causal)

Year Dummy Coefficients Measure Growth Rates

Conditional Expectations Are The Tool to Interpret

\[\begin{align*} \mathbb{E}[\log(\text{patents})_{it} \mid t=2005] &= \beta_0 + \beta_1\log(\text{R\&D})_{it} \\ \mathbb{E}[\log(\text{patents})_{it} \mid t=2006] &= (\beta_0 + \beta_2) + \beta_1\log(\text{R\&D})_{it} \\ \mathbb{E}[\log(\text{patents})_{it} \mid t=2007] &= (\beta_0 + \beta_3) + \beta_1\log(\text{R\&D})_{it} \end{align*}\]

  • Average log change in patents across all firms
  • Geometric mean growth rate of patents in the industry, holding R&D constant
    • \(\beta_2\): 2005 to 2006
    • \(\beta_3\): 2005 to 2007

Year dummies measure growth rates between periods — not “the level in 2006 vs 2005.”
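Because the outcome is in logs, a year-dummy coefficient is a change in log points; the exact percentage change it implies is \(100(e^{\beta} - 1)\). A minimal sketch of the conversion follows. The coefficient values are hypothetical, chosen only to show the arithmetic, and the function name is an assumption.

```python
import math

def log_points_to_percent(coef):
    """Exact percentage change implied by a dummy coefficient in a log-outcome model."""
    return 100.0 * (math.exp(coef) - 1.0)

# Hypothetical estimates, not taken from any data set
beta_2, beta_3 = 0.10, 0.25

growth_2006 = log_points_to_percent(beta_2)   # growth 2005 -> 2006, holding R&D constant
growth_2007 = log_points_to_percent(beta_3)   # growth 2005 -> 2007, holding R&D constant
print(f"2005-2006: {growth_2006:.2f}%   2005-2007: {growth_2007:.2f}%")
```

For small coefficients the log-point value and the percentage change are close; the gap widens as the coefficient grows.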

Interactions Allow the R&D Elasticity to Vary Over Time

Heterogeneous Elasticities

\[ \begin{align*} \log(\text{patents})_{it} &= \beta_0 + \beta_1\log(\text{R\&D})_{it} + \beta_2\mathbb{1}[t=2006] + \beta_3\mathbb{1}[t=2007] \\ &+ \beta_4(\log(\text{R\&D})_{it} \times \mathbb{1}[t=2006]) + \beta_5(\log(\text{R\&D})_{it} \times \mathbb{1}[t=2007]) \\ &+ e_{it} \end{align*} \]

Conditional means by year

\[\begin{align*} \mathbb{E}[\log(\text{patents})_{it} \mid t=2005] &= \beta_0 + \beta_1\log(\text{R\&D})_{it} \\ \mathbb{E}[\log(\text{patents})_{it} \mid t=2006] &= (\beta_0 + \beta_2) + (\beta_1 + \beta_4)\log(\text{R\&D})_{it} \\ \mathbb{E}[\log(\text{patents})_{it} \mid t=2007] &= (\beta_0 + \beta_3) + (\beta_1 + \beta_5)\log(\text{R\&D})_{it} \end{align*}\]

Level Shifts and Slope Shifts Are Separately Identified

Decomposing Differential Effects

Year  Intercept              Elasticity
----  ---------------------  ---------------------
2005  \(\beta_0\)            \(\beta_1\)
2006  \(\beta_0 + \beta_2\)  \(\beta_1 + \beta_4\)
2007  \(\beta_0 + \beta_3\)  \(\beta_1 + \beta_5\)

How the patents-R&D elasticity changes over time

  • \(\beta_4\): elasticity change 2005 \(\to\) 2006
  • \(\beta_5\): elasticity change 2005 \(\to\) 2007

No need to interpret \(\beta_4\) and \(\beta_5\) individually; the conditional means do the work.
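One way to see that the conditional means do the work: the fully interacted model reproduces the intercept and slope of a separate regression for each year. The sketch below uses simulated data with invented year-specific intercepts and slopes (only the 37-firm, three-year layout mirrors the setting above).

```python
import numpy as np

rng = np.random.default_rng(1)
n_firms = 37
years = np.repeat([2005, 2006, 2007], n_firms)
log_rd = rng.normal(size=years.size)

# Hypothetical data-generating process: intercept and slope both differ by year
slope = np.where(years == 2005, 0.5, np.where(years == 2006, 0.6, 0.8))
shift = np.where(years == 2005, 0.0, np.where(years == 2006, 0.1, 0.3))
log_pat = 1.0 + shift + slope * log_rd + rng.normal(scale=0.1, size=years.size)

d06 = (years == 2006).astype(float)
d07 = (years == 2007).astype(float)
X = np.column_stack([np.ones(years.size), log_rd, d06, d07, log_rd * d06, log_rd * d07])
b = np.linalg.lstsq(X, log_pat, rcond=None)[0]      # b[0..5] = beta_0 .. beta_5

# Year-specific elasticities read off the conditional means
elasticity = {2005: b[1], 2006: b[1] + b[4], 2007: b[1] + b[5]}

# Separate regression using 2006 observations only
m = years == 2006
b06 = np.linalg.lstsq(np.column_stack([np.ones(m.sum()), log_rd[m]]), log_pat[m], rcond=None)[0]
print(elasticity[2006], b06[1])                      # the two slopes coincide
```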

Exercise 3: Empirical Models Interact All Relevant Variables

Gender Wage Gaps Changed After the Mining Boom

Three Patterns from the Data

  • Gender: Men earn a constant wage premium over women
  • Time trend: Wages grow over time for both groups
  • Structural break (2005): After the mining company arrives, the male premium widens

The Empirical Model Interacts Gender, Time, and Post-2005

Eight Coefficients for Four Groups

\[ \begin{align*} \log(\text{wages})_{it} &= \beta_0 + \beta_1\mathbb{1}[i\text{ is male}] + \beta_2 t + \beta_3\mathbb{1}[t \geq 2005] \\ &+ \beta_4(\mathbb{1}[i\text{ is male}] \times t) + \beta_5(\mathbb{1}[i\text{ is male}] \times \mathbb{1}[t \geq 2005]) \\ &+ \beta_6(t \times \mathbb{1}[t \geq 2005]) + \beta_7(t \times \mathbb{1}[t \geq 2005] \times \mathbb{1}[i\text{ is male}]) \\ &+ e_{it} \end{align*} \]

This model captures level differences, trends, and how both changed after 2005, separately for men and women.

Conditional Means: Pre-2005

Women and Men Before the Structural Break

Women before 2005 (base category)

\[ \mathbb{E}[\log(\text{wages})_{it} \mid \mathbb{1}[i \text{ is male}]=0,t<2005] = \beta_0 + \beta_2 t \]

Men before 2005

\[ \mathbb{E}[\log(\text{wages})_{it} \mid \mathbb{1}[i \text{ is male}]=1,t<2005] = (\beta_0 + \beta_1) + (\beta_2 + \beta_4)t \]

\(\beta_1\) shifts the intercept; \(\beta_4\) shifts the slope.

Conditional Means: Post-2005

Women and Men After the Structural Break

Women after 2005

\[ \mathbb{E}[\log(\text{wages})_{it} \mid \mathbb{1}[i \text{ is male}]=0,\; t \geq 2005] = (\beta_0 + \beta_3) + (\beta_2 + \beta_6)t\]

Men after 2005

\[\begin{align*} \mathbb{E}[\log(\text{wages})_{it} \mid \mathbb{1}[i \text{ is male}]=1,\; t \geq 2005] &= (\beta_0 + \beta_1 + \beta_3 + \beta_5) \\ &\quad + (\beta_2 + \beta_4 + \beta_6 + \beta_7)t \end{align*}\]

Each coefficient modifies either the intercept or slope for a specific group-period combination.

Taking Differences Isolates Each Coefficient’s Role

Condition on Group, Then Difference
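As a worked step, the four conditional means from the previous two slides can be differenced directly. Nothing new is assumed here; each line subtracts one conditional mean from another.

\[\begin{align*} \text{Gender gap, } t < 2005: &\quad [(\beta_0 + \beta_1) + (\beta_2 + \beta_4)t] - [\beta_0 + \beta_2 t] = \beta_1 + \beta_4 t \\ \text{Gender gap, } t \geq 2005: &\quad [(\beta_0 + \beta_1 + \beta_3 + \beta_5) + (\beta_2 + \beta_4 + \beta_6 + \beta_7)t] - [(\beta_0 + \beta_3) + (\beta_2 + \beta_6)t] = (\beta_1 + \beta_5) + (\beta_4 + \beta_7)t \\ \text{Change in the gap}: &\quad [(\beta_1 + \beta_5) + (\beta_4 + \beta_7)t] - [\beta_1 + \beta_4 t] = \beta_5 + \beta_7 t \\ \text{Women, post minus pre}: &\quad [(\beta_0 + \beta_3) + (\beta_2 + \beta_6)t] - [\beta_0 + \beta_2 t] = \beta_3 + \beta_6 t \end{align*}\]

So \(\beta_5\) and \(\beta_7\) describe how the male premium changed after 2005, while \(\beta_3\) and \(\beta_6\) describe the break for women.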

Coefficient Signs Follow Directly from the Figure

Economic Interpretation

Positive (\(> 0\))

  • \(\beta_1\): male premium
  • \(\beta_2\): wages grow over time
  • \(\beta_7\): male wages grow faster post-2005

Zero (\(= 0\))

  • \(\beta_3\): no level break for women at 2005
  • \(\beta_4\): same pre-2005 growth rate
  • \(\beta_6\): female growth unchanged post-2005

Negative (\(< 0\))

  • \(\beta_5\): relative to the base group (women pre-2005), the intercept for men post-2005 is lower than what other coefficients predict

Exercise 4: Panel Data Enables Identification and Estimation

Panel Data Follows the Same Units Over Time

Definition and Structure

  • \(y_{it}\), \(x_{it}\) for \(i = 1, \ldots, N\) and \(t = 1, \ldots, T\)
  • Panel data: same units observed across multiple time periods
  • Cross-section: one \(t\) only
  • Repeated cross-section: same population, different individuals each period

Error decomposition

\[ e_{it} = \alpha_i + v_{it} \]

Panel Structure Enables Identification of Parameters

Identification vs Estimation

Identification

  • Could we recover unique values for each parameter?
    • Cross-section: \(\alpha_i\) in error, correlated with \(x_{it}\) \(\to\) cannot identify \(\beta\)
    • Panel: difference out \(\alpha_i\)
  • Requires: \(\mathbb{E}[v_{it} \mid x_{it}] = 0\)

Estimation

  • Given identification, how do we compute \(\hat\beta\) from the data?

\[ \Delta y_i = \delta + \beta_1\Delta x_{i1} + \cdots + \beta_k\Delta x_{ik} + \Delta v_i \]

  • Requires \(T \geq 2\) and within-unit variation

Example: cannot estimate returns to education via FD if education does not change over time.

Panel Data Reduces OVB but Cannot Eliminate All Sources of Bias

Solves

  • Time-invariant OVB (\(\alpha_i\))
  • Unit-invariant OVB (\(\lambda_t\))

Does not solve

  • Time-varying confounders (\(u_{it}\))
  • Measurement error
  • Selection bias

Less scope for OVB, but not zero.

Exercise 5: First-Differencing Amplifies Measurement Error

Education Is Measured with Error in Both Periods

True vs Observed Variables

\[ \text{education}_{it} = \text{education}^{*}_{it} + e_{it} \]

Assumptions

  • \(e_{it}\) uncorrelated with true education and other variables
  • Education varies little over time for adults

Cross-Sectional Attenuation Bias Shrinks the Coefficient Towards Zero

The Baseline Problem

Population model

\[ \log(\text{wage})_i = \alpha + \beta\text{education}^{*}_i + \epsilon_i \]

Observed model substitutes \(\text{education}_i = \text{education}^{*}_i + e_i\)

Attenuation bias

\[ \text{plim}\;\hat\beta = \beta \cdot \frac{\text{Var}(\text{educ}^*)}{\text{Var}(\text{educ}^*) + \text{Var}(e)} \]

The ratio is less than 1, so the coefficient is biased towards zero (derived in Topic 6).

First-Differencing Increases the Measurement Error Variance

The Panel Data Paradox

FD of observed education

\[ \Delta\text{education}_i = \Delta\text{education}^{*}_{i} + (e_{i2} - e_{i1}) \]

If \(e_{i1}\) and \(e_{i2}\) are uncorrelated

\[ \text{Var}(e_{i2} - e_{i1}) = \text{Var}(e_{i1}) + \text{Var}(e_{i2}) \]

Panel Data Involves a Fundamental Bias Trade-off

FD Eliminates Fixed Confounders but Amplifies Measurement Error

\[ \text{plim}\;\hat\beta_{\text{FD}} = \beta \cdot \frac{\text{Var}(\Delta\text{educ}^*)}{\text{Var}(\Delta\text{educ}^*) + \text{Var}(e_{i1}) + \text{Var}(e_{i2})} \]

Benefits

  • Eliminates \(\alpha_i\), reduces OVB from time-invariant confounders
  • Enables causal identification under strict exogeneity

Costs

  • Measurement error variance grows in denominator
  • Numerator small if \(x\) changes little over time
  • Attenuation ratio is smaller — more severe bias towards zero
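The two attenuation ratios can be compared with a few lines of arithmetic. The variances below are hypothetical and chosen only to illustrate the formulas above: education varies a lot across people but little within a person over time.

```python
def attenuation_ratio(var_signal, var_noise):
    """Share of the true coefficient that survives classical measurement error."""
    return var_signal / (var_signal + var_noise)

# Hypothetical variances, for illustration only
var_educ = 9.0          # Var(educ*) across individuals
var_delta_educ = 0.25   # Var(delta educ*): education changes little over time
var_e = 1.0             # Var(e) in each period, uncorrelated across periods

ratio_cross_section = attenuation_ratio(var_educ, var_e)
ratio_first_diff = attenuation_ratio(var_delta_educ, var_e + var_e)

print(f"cross-section: {ratio_cross_section:.3f}   first differences: {ratio_first_diff:.3f}")
```

With these numbers the cross-sectional estimate keeps 90% of \(\beta\), while the FD estimate keeps about 11%.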

Exercise 6: Panel Data Cannot Solve Selection Bias

Roommate Nationality May Affect Student Grades

Self-Selection into Rooms Creates Endogeneity

Let \(\text{same}_{it} = \mathbb{1}[i\text{ has same-nationality roommate in } t]\)

\[ \text{grades}_{it} = \alpha + \beta\;\text{same}_{it} + e_{it} \]

  • Exogeneity holds under random assignment of roommates
  • If students choose roommates: an omitted equation determines room selection
  • More outgoing students may prefer different nationalities and perform differently academically
  • \(\text{Cov}(\text{same}_{it}, e_{it}) \neq 0\) arises from selection, not unobserved heterogeneity

Two Years of Data Require Roommate Changes

Panel Structure and Exogenous Mobility

Year 1

\(\text{grades}_{i1} = \alpha + \beta\;\text{same}_{i1} + \alpha_i + u_{i1}\)

Year 2

\(\text{grades}_{i2} = \alpha + \beta\;\text{same}_{i2} + \alpha_i + u_{i2}\)

First-differencing

\(\Delta\text{grades}_i = \beta\;\Delta\text{same}_i + \Delta u_i\)

  • Critical: students must change roommates (\(\Delta\text{same}_i \neq 0\) for some)
  • Exogenous mobility design — plausible if the university reassigns rooms

The Problem Is Selection, Not Unobserved Heterogeneity

What Panel Data Cannot Fix

  • FD removes \(\alpha_i\) (unobserved ability) — but the core problem is selection into rooms
  • If reasons for changing roommates correlate with grade changes, FD does not help
  • Panel data addresses unobserved heterogeneity
  • It does not address selection bias

Random Assignment Plus Panel Data Strengthens Identification

You Get What You Pay For

  • Random assignment of roommates:
    • \(\beta\) is identified even in cross-section
    • Panel adds precision: removes \(\alpha_i\) from the error \(\to\) smaller variance \(\to\) smaller standard errors
  • Potential selection: panel data alone cannot solve the selection problem

Part II: Inference, Functional Forms, and Composition

Inference, Functional Forms, and Composition in Panel Data

We Run into an Old Friend!

Inference

  • AS2 violated: same unit observed repeatedly
  • AS7 may fail: heterogeneous units
  • Affects SEs, not \(\hat{\beta}\)

Functional Forms

  • Linear vs nonparametric time trends
  • Treatment absorbed by time FE
  • Unbalanced panels → composition effects

Exercise 7: Clustered Standard Errors Account for Within-Unit Correlation

Clustering Allows Arbitrary Within-Unit Correlation

A Conservative Fix for Panel Inference

What clustering does

  • Allows arbitrary correlation within cluster
  • Requires independence across clusters
  • Does not change \(\hat{\beta}\) — only the standard errors

Two sources of correlation

  • Within-unit persistence: worker earnings persist year to year
  • Within-group spillovers: departmental training affects all workers

Serial Correlation Only Affects Inference

The Three Pillars: Identification, Estimation, Inference

  • Identification (AS1, AS3-AS5): Can we recover \(\beta\)? Not affected
  • Estimation (AS2): How do we compute \(\hat{\beta}\)? Not affected
  • Inference (AS2, AS7): Are our SEs, p-values, CIs valid?
    • Assumption 2 (i.i.d. sampling): same unit observed repeatedly → errors correlated within unit
    • Assumption 7 (homoskedasticity): error variance may differ across units
    • Both violations affect inference only: \(\hat{\beta}\) unchanged, SEs wrong
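A minimal sketch of what clustering changes, assuming simulated data in which both the regressor and part of the error are constant within unit. The function name is hypothetical, and the sandwich formula is written without any finite-sample correction, which statistical packages usually add.

```python
import numpy as np

def ols_with_cluster_se(X, y, cluster):
    """OLS coefficients with classical and cluster-robust standard errors."""
    XtX_inv = np.linalg.inv(X.T @ X)
    b = XtX_inv @ X.T @ y
    u = y - X @ b
    n, k = X.shape
    se_classical = np.sqrt(np.diag(XtX_inv) * (u @ u) / (n - k))
    meat = np.zeros((k, k))
    for g in np.unique(cluster):
        s = X[cluster == g].T @ u[cluster == g]   # sum of scores within cluster g
        meat += np.outer(s, s)
    se_cluster = np.sqrt(np.diag(XtX_inv @ meat @ XtX_inv))
    return b, se_classical, se_cluster

rng = np.random.default_rng(3)
G, T = 50, 10                                     # hypothetical: 50 units, 10 periods each
cluster = np.repeat(np.arange(G), T)
x = np.repeat(rng.normal(size=G), T)              # regressor constant within unit
e = np.repeat(rng.normal(size=G), T) + rng.normal(size=G * T)   # persistent unit component
y = 1.0 + 0.5 * x + e
X = np.column_stack([np.ones(G * T), x])

b, se_classical, se_cluster = ols_with_cluster_se(X, y, cluster)
print(b, se_classical, se_cluster)
```

The point estimate is the ordinary OLS one in both cases; only the standard errors differ, and in this design the clustered ones are larger.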

Exercise 8: Time Fixed Effects Can Be Collinear with Treatment

Measuring the Effect of Increased Force

All Municipalities Treated Simultaneously

Empirical setup

  • Dependent variable: Drug usage at municipality \(i\) on day \(t\)
  • Treatment: All municipalities increase police on the same date (vertical line in figure)
  • Drug usage has a pre-existing upward trend

Day Fixed Effects Cannot Separate Treatment from Time

The Collinearity Trap

Ideal model: \(\text{drug usage}_{it} = \mu\;\text{post}_t + \theta_i + \rho_t + e_{it}\)

Conditional expectations — with \(T = 4\), treatment at \(t = 3\):

\[\begin{align*} \mathbb{E}[\text{drug usage}_{it} \mid t=1] &= \theta_i + \rho_1 \\ \mathbb{E}[\text{drug usage}_{it} \mid t=2] &= \theta_i + \rho_2 \\ \mathbb{E}[\text{drug usage}_{it} \mid t=3] &= \mu + \theta_i + \rho_3 \\ \mathbb{E}[\text{drug usage}_{it} \mid t=4] &= \mu + \theta_i + \rho_4 \end{align*}\]

  • \((\mu + \rho_3)\) observed jointly, not separately \(\Rightarrow\) \(\mu\) not identified
  • Reason: \(\text{post}_t\) is a linear combination of day dummies

A Linear Time Trend Restores Identification

Parametric but Estimable

\[\text{drug usage}_{it} = \mu\text{post}_t + \theta_i + \gamma t + e_{it}\]

Interpretation of \(\gamma\)

  • Average change per time unit
  • Imposes linearity — may miss curvature
  • Intermediate: polynomial

The trade-off

                   Parametric trend    Day FE
-----------------  ------------------  -----------------
Flexibility        Low (linear)        High (any shape)
Estimate \(\mu\)?  Yes                 No (collinear)
Risk               Misspecified trend  No identification

When treatment varies only at the time level, time FE absorb it completely.
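The collinearity can be seen by checking the rank of the design matrix. The sketch below uses the same layout as the example above (\(T = 4\), treatment from \(t = 3\)) with a hypothetical number of municipalities; the municipality effects \(\theta_i\) are left out to keep it short, since they do not affect the argument.

```python
import numpy as np

T, treat_start, N = 4, 3, 5                       # N municipalities is a hypothetical choice
t = np.tile(np.arange(1, T + 1), N)               # day index for every municipality-day
post = (t >= treat_start).astype(float)           # identical for all municipalities

# Day fixed effects: one dummy per day
day_dummies = (t[:, None] == np.arange(1, T + 1)).astype(float)
X_day_fe = np.column_stack([post, day_dummies])   # 5 columns

# Linear trend instead of day dummies
X_trend = np.column_stack([np.ones(t.size), post, t])   # 3 columns

rank_day_fe = np.linalg.matrix_rank(X_day_fe)
rank_trend = np.linalg.matrix_rank(X_trend)
print(rank_day_fe, X_day_fe.shape[1])             # rank is below the number of columns
print(rank_trend, X_trend.shape[1])               # full column rank
```

With day dummies, post equals the sum of the day-3 and day-4 dummies, so one column is redundant. With the linear trend, all three columns are linearly independent.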

Exercise 9: Age Dummies Provide Nonparametric Functional Forms

When We Do Not Know the Functional Form, Use Dummies

Nonparametric Estimation

\[ \log(\text{earnings})_{it} = \alpha_i + \theta_t + \sum_{j=17}^{85} \gamma_j\;\mathbb{1}[\text{age}_{it} = j] + e_{it} \]

  • \(\alpha_i\): individual FE (absorbs ability, education, etc.)
  • \(\theta_t\): time FE (absorbs aggregate macroeconomic trends)
  • \(\gamma_j\): average difference in log-earnings between workers aged \(j\) and workers aged 16, holding constant \(\alpha_i\) and \(\theta_t\)
    • Nonparametric: no assumption on the shape of the age-earnings relation
    • Parametric: quadratic requires \(f(\text{age}) = \beta_1\;\text{age} + \beta_2\;\text{age}^2\)

Why log? APC problem

Imprecision at Extremes Reflects Thin Data

The Variance Formula for Dummy Variables

Each age dummy \(d_j = \mathbb{1}[\text{age}_{it} = j]\) is a binary variable with proportion \(p_j = n_j/n\):

\[ \text{Var}(\hat{\gamma}_j) \propto \frac{\sigma^2}{n \cdot p_j(1 - p_j)} \]

  • Most workers aged 25-64 \(\Rightarrow\) \(p_j\) near its maximum \(\Rightarrow\) \(\text{Var}(\hat{\gamma}_j)\) small
  • Few workers at ages 16-24 and 65-85 \(\Rightarrow\) \(p_j \approx 0\) \(\Rightarrow\) \(\text{Var}(\hat{\gamma}_j)\) large
  • \(\hat{\gamma}_j\) at extreme ages has wide confidence intervals

Nonparametric flexibility comes at the cost of imprecision where data is thin.
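The proportionality above makes the loss of precision easy to quantify. In this sketch the sample size and the two proportions are hypothetical: one age cell in the middle of the distribution and one at the extremes.

```python
def relative_variance(n, p):
    """Variance of a dummy coefficient up to the common factor sigma^2."""
    return 1.0 / (n * p * (1.0 - p))

n = 100_000                 # hypothetical number of worker-year observations
p_prime_age = 0.025         # hypothetical share of observations at a single prime age
p_extreme_age = 0.0005      # hypothetical share at a single very old or very young age

variance_ratio = relative_variance(n, p_extreme_age) / relative_variance(n, p_prime_age)
se_ratio = variance_ratio ** 0.5
print(f"variance ratio: {variance_ratio:.1f}   standard-error ratio: {se_ratio:.1f}")
```

Under these assumed shares, the confidence interval for the thin age cell is roughly seven times wider than for the well-populated one.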

Exercise 10: Fixed Effects Decompose Treatment Into Incentive and Selection

Performance Pay Increases Firm-Level Productivity by 20%

But How Much Is Incentives vs Selection?

A firm introduces performance pay. The panel is unbalanced: some workers leave (exiters), some stay (stayers), some join (entrants).

\[ \log(\widehat{\text{productivity}})_{it} = \hat{\alpha}_i + \hat{\beta}\;\text{performance pay}_t \]

OLS (Pooled)

  • \(\hat{\beta}_{\text{OLS}} = 0.20\) (SE \(= 0.03\))
  • Captures total effect at firm level
    • Uses all workers (stayers + exiters + entrants)

Fixed Effects

  • \(\hat{\beta}_{\text{FE}} = 0.10\) (SE \(= 0.02\))
  • Captures within-worker incentive effect only
    • Only stayers contribute to identification

Half the Effect Is Incentives, Half Is Composition

The Decomposition

Incentive effect

  • \(\hat{\beta}_{\text{FE}} = 0.10\)
  • Same workers produce more under performance pay
    • They work harder

Composition effect

  • \(\hat{\beta}_{\text{OLS}} - \hat{\beta}_{\text{FE}} = 0.20 - 0.10 = 0.10\)
  • Different workers join the firm under performance pay
    • \(\mathbb{E}[\alpha_i \mid \text{entrant}] > \mathbb{E}[\alpha_i \mid \text{exiter}]\)

OLS captures total change; FE isolates the within-unit mechanism. The difference is the selection channel.
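The decomposition can be reproduced in a small simulation. Everything below is an invented illustration: the group sizes, the mean abilities of stayers, exiters and entrants, and the true incentive effect of 0.10 are assumptions chosen so that the pooled coefficient lands near 0.20.

```python
import numpy as np

rng = np.random.default_rng(2)

def make_group(n, mean_alpha, periods, start_id):
    """Worker ids, period index and worker effect for one group of workers."""
    ids = np.repeat(np.arange(start_id, start_id + n), len(periods))
    t = np.tile(periods, n)
    alpha = np.repeat(rng.normal(mean_alpha, 0.1, size=n), len(periods))
    return ids, t, alpha

groups = [
    make_group(300, 0.0, [0, 1, 2, 3], 0),     # stayers: observed before and after
    make_group(100, -0.2, [0, 1], 300),        # exiters: lower ability, leave before the change
    make_group(100, 0.2, [2, 3], 400),         # entrants: higher ability, join after the change
]
ids = np.concatenate([g[0] for g in groups])
t = np.concatenate([g[1] for g in groups])
alpha = np.concatenate([g[2] for g in groups])

pay = (t >= 2).astype(float)                   # performance pay from period 2 onwards
y = alpha + 0.10 * pay + rng.normal(0, 0.05, size=ids.size)   # log productivity

# Pooled OLS on a single dummy equals the difference in means
b_ols = y[pay == 1].mean() - y[pay == 0].mean()

# Fixed effects via the within transformation (demean by worker)
def demean(v):
    return v - (np.bincount(ids, weights=v) / np.bincount(ids))[ids]

pay_w, y_w = demean(pay), demean(y)
b_fe = (pay_w @ y_w) / (pay_w @ pay_w)

print(f"pooled: {b_ols:.3f}   FE: {b_fe:.3f}   composition: {b_ols - b_fe:.3f}")
```

Exiters and entrants have no within-worker variation in performance pay, so their demeaned regressor is zero and they drop out of the FE estimate.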

Formal decomposition

Exercise 11: Worker Composition Changes Require Fixed Effects

This Exercise Combines Three Previous Challenges

Collinearity (Q8) + Composition (Q10) + Time-Varying Controls (New)

\[ \text{productivity}_{it} = \beta_1\;\text{contingent}_t + \beta_2\;\text{weather}_t + \beta_3\;\text{width}_{it} + \beta_4\;\text{height}_{it} + e_{it} \]

Q8 callback: Collinearity with time FE

  • Contingent pay switches on one date for everyone
  • \(\text{contingent}_t\) collinear with day dummies
  • Cannot include time FE → use weather as observable time-varying control

Q10 callback: Negative composition

  • Best workers leave for blueberry harvest (neighbouring farms offer better alternatives)
  • Remaining workers less productive: \[\mathbb{E}[\gamma_i \mid \text{second half}] < \mathbb{E}[\gamma_i \mid \text{first half}]\]
  • Direction flips from Q10 (where better workers joined)

Width and Height Are Time-AND-Individual Varying

Controls vs Fixed Effects Address Different Problems

\[\begin{align*} \text{productivity}_{it} &= \beta_1\;\text{contingent}_t + \beta_2\;\text{weather}_t + \beta_3\;\text{width}_{it} + \beta_4\;\text{height}_{it} \\ &+ \gamma_i + e_{it} \end{align*}\]

  • Width and height vary across workers AND days — not captured by worker FE (\(\gamma_i\) absorbs time-invariant traits only)
  • Still subject to Assumption 5: if width or height are correlated with \(e_{it}\), OVB persists
  • Only workers in both periods contribute — churn shrinks effective sample

Controls handle time-varying confounders; FE handles time-invariant heterogeneity.

Exercise 12: Multiple Fixed Effects Identify Leader Quality Through Rotation

Call Centre Productivity Varies Across Operators, Leaders, and Days

Three Sources of Variation

Setting

  • Outcome: calls answered per hour
  • Panel of operators \(\times\) days
  • Random call assignment: no operator-call selection
  • Operators and leaders rotate across teams
    • Rotation provides identifying variation

Three sources of heterogeneity

  • \(\lambda_i\): operator ability
  • \(\mu_j\): leader quality (the parameter of interest)
  • \(\theta_t\): day-level conditions (holidays, system outages)

Three-Way Fixed Effects Isolate Leader Quality

Rotation Is the Identification Condition

\[ \text{productivity}_{ijt} = \lambda_i + \mu_j + \theta_t + e_{ijt} \]

  • \(\lambda_i\): operator FE — absorbs intrinsic worker ability
  • \(\mu_j\): leader FE — the object of interest
  • \(\theta_t\): day FE — absorbs common daily shocks
  • Identification requires rotation: same operator observed under different leaders
    • Without rotation, \(\lambda_i\) and \(\mu_j\) are not separately identified

The F-Test Provides Evidence for Leader Quality Differences

Joint Significance of Leader Fixed Effects

Restricted vs unrestricted

  • Restricted (\(H_0\) true): \(\text{productivity}_{ijt} = \lambda_i + \mu + \theta_t + e_{ijt}\)
  • Unrestricted (\(H_1\)): \(\text{productivity}_{ijt} = \lambda_i + \mu_j + \theta_t + e_{ijt}\)

Five-step framework

  1. Choose \(\alpha = 0.05\)
  2. \(H_0: \mu_2 = \mu_3 = \cdots = \mu_J = 0\)
  3. \(F = \frac{(\text{RSS}_R - \text{RSS}_U)/q}{\text{RSS}_U/(n - K)}\;,\quad q = J - 1\)
  4. Reject \(H_0\) if \(F > F_{q,\; n-K,\; \alpha}\)
  5. If reject: leaders differ in quality

Multiple fixed effects require sufficient rotation across dimensions for identification.
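Step 3 of the framework is plain arithmetic once both models have been estimated. The sketch below uses hypothetical residual sums of squares and degrees of freedom; the resulting statistic would then be compared with the tabulated critical value \(F_{q,\; n-K,\; \alpha}\).

```python
def f_statistic(rss_restricted, rss_unrestricted, q, df_resid):
    """F statistic for q linear restrictions; df_resid is n - K from the unrestricted model."""
    return ((rss_restricted - rss_unrestricted) / q) / (rss_unrestricted / df_resid)

# Hypothetical inputs: J = 10 leaders, so q = J - 1 = 9 restrictions
F = f_statistic(rss_restricted=1200.0, rss_unrestricted=1000.0, q=9, df_resid=1000)
print(f"F = {F:.2f}")
```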

AKM (1999)

Part III: Differences-in-Differences

Panel Methods Need a Control Group to Handle Universal Treatment

From Within-Unit Variation to Between-Group Comparisons

Weeks 1-2

  • FD and FE eliminate time-invariant OVB (\(\alpha_{i}\))
  • Collinearity trap: universal treatment absorbed by time FE (Q8)
  • Composition effects in unbalanced panels (Q10)

Today

  • What if we have a treatment AND a control group?
  • The Differences-in-Differences estimator

The Rubin Causal Model

Every Unit Has Two Potential Outcomes

The Model

For each individual \(i\):

  • \(y_{i}(1)\): outcome if treated
  • \(y_{i}(0)\): outcome if not treated
  • Individual treatment effect: \(\tau_{i} = y_{i}(1) - y_{i}(0)\)

We only observe one potential outcome per unit — the other is the counterfactual

Example

       \(y_{i}(1)\)  \(y_{i}(0)\)  \(\tau_{i}\)
-----  ------------  ------------  ------------
María  £45,000       ?             ?
Pedro  ?             £30,000       ?

We never observe both columns for the same person

From Individual Effects to Population Averages

The Average Treatment Effect

  • We cannot compute \(\tau_{i}\) for any single person.
  • But if all units come from the same DGP:
    • \(\text{ATE} = \mathbb{E}[y_{i}(1)] - \mathbb{E}[y_{i}(0)]\)
  • Let \(T_{i} = 1\) if treated, \(0\) otherwise.
  • What we actually observe:
    • \(\mathbb{E}[y_{i} \mid T_{i} = 1] - \mathbb{E}[y_{i} \mid T_{i} = 0]\)

When are these the same?

Only if who gets treated is unrelated to the potential outcomes.

The Observed Difference Decomposes into ATT and Selection Bias

The Add-and-Subtract Trick

Add and subtract \(\mathbb{E}[y(0) \mid T\!=\!1]\) inside the observed difference:

\[\begin{align*} \mathbb{E}[y \mid T\!=\!1] - \mathbb{E}[y \mid T\!=\!0] &= \mathbb{E}[y(1) \mid T\!=\!1] - \textcolor{orange}{\mathbb{E}[y(0) \mid T\!=\!1]} \\ &+ \textcolor{orange}{\mathbb{E}[y(0) \mid T\!=\!1]} - \mathbb{E}[y(0) \mid T\!=\!0] \\ &= \underbrace{\mathbb{E}[y(1) - y(0) \mid T\!=\!1]}_{\text{ATT}} \\ &+ \underbrace{\mathbb{E}[y(0) \mid T\!=\!1] - \mathbb{E}[y(0) \mid T\!=\!0]}_{\text{selection bias}} \end{align*}\]
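The decomposition is an identity, so it also holds exactly in any sample where both potential outcomes are known. The simulation below is a sketch under invented assumptions: an unobserved "ability" variable raises both the untreated outcome and the chance of being treated.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 10_000
ability = rng.normal(size=n)                          # hypothetical unobserved trait

y0 = 30 + 5 * ability + rng.normal(size=n)            # potential outcome without treatment
y1 = y0 + 10 + 2 * ability                            # potential outcome with treatment
treated = ability + rng.normal(size=n) > 0            # self-selection on ability

y = np.where(treated, y1, y0)                         # only one outcome is observed per unit

observed_diff = y[treated].mean() - y[~treated].mean()
att = (y1 - y0)[treated].mean()
selection_bias = y0[treated].mean() - y0[~treated].mean()

print(f"observed: {observed_diff:.2f} = ATT {att:.2f} + selection bias {selection_bias:.2f}")
```

Because high-ability units select into treatment here, the selection-bias term is positive and the observed difference overstates the ATT.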

Independence Makes Selection Bias Vanish

Research design assumption (randomisation)

\[T_{i} \perp\!\!\!\perp (y_{i}(0),\; y_{i}(1))\]

Selection bias term:

\[\begin{align*} \underbrace{\mathbb{E}[y(0) \mid T\!=\!1]}_{\substack{\text{= } \mathbb{E}[y(0)] \\ \text{by independence}}} - \underbrace{\mathbb{E}[y(0) \mid T\!=\!0]}_{\substack{\text{= } \mathbb{E}[y(0)] \\ \text{by independence}}} &= \mathbb{E}[y(0)] - \mathbb{E}[y(0)] = 0 \end{align*}\]

Under independence, the observed difference equals the ATT, which then coincides with the ATE.

DiD Recovers the Average Effect, Never the Individual

The Counterfactual Remains Unobserved

\[\text{ATT} = \mathbb{E}[y_{i}(1) - y_{i}(0) \mid T_{i} = 1]\]

  • The counterfactual is still unobserved for every individual
  • We use the control group’s trend as a proxy for the treated group’s missing trajectory
  • Some individuals gain more, some less — we only recover the average

Exercise 13: Tax Simplification and Firm Profitability

Some US States Simplified Their Tax Codes Between 1985 and 2005

Setting and Research Question

The Natural Experiment

  • Some states reformed their tax codes to reduce complexity
  • Other states kept the old system
  • Reform timing was driven by politics — not by firm profitability
    • Treatment assignment “as good as random”

Data

  • Unit: firm \(i\) in state \(s\) at time \(t\)
  • Outcome: firm profit
  • Treatment: \(\mathbb{1}[\text{simple}_{i}]\) = firm in a reformed state
  • Periods: 1985 (before) and 2005 (after)
  • Repeated cross-sections — different firms sampled each year

DiD Works Without Panel Data

Repeated Cross-Sections Suffice

\[\text{profit}_{it} = \alpha + \beta\;\mathbb{1}[\text{simple}_{i}] + \lambda\;\mathbb{1}[t = 2005] + \theta\;(\mathbb{1}[\text{simple}_{i}] \times \mathbb{1}[t = 2005]) + e_{it}\]

  • \(\theta\) = the DiD estimator
    • Additional change in profit for simplified states beyond the common trend
  • No panel data needed — DiD only requires group and time variation

The Identification Assumption

  • Without tax simplification, both groups would have followed the same profit trajectory
  • Violated if reforming states were already on a different growth path
  • “As good as random” assignment makes this assumption more plausible

Exercise 14: Police Force and Discrimination

Police Encounters Vary by Officer-Citizen Race Combinations

Setting and Research Question

The Setting

  • Data on police-civilian encounters in a US city
  • Each encounter has a citizen (\(i\)) and an officer (\(j\))
  • Outcome: intensity of force used during the encounter
  • Both citizen and officer race are observed

The Question

  • Do officers use more force against citizens of a different race?
  • This is about in-group bias, not discrimination per se
  • No time dimension — purely cross-sectional
  • Interaction captures the race-match effect

The Interaction Model Separates Race Effects from Match Effects

Cross-Sectional Interaction Model

\[\begin{align*} \text{force}_{ij} &= \beta_{0} + \beta_{1}\mathbb{1}[i\text{ is Black}] + \beta_{2}\mathbb{1}[j\text{ is Black}] \\ &+ \beta_{3}(\mathbb{1}[i\text{ is Black}] \times \mathbb{1}[j\text{ is Black}]) + e_{ij} \end{align*}\]

where \(i\) = citizen, \(j\) = officer

Coefficient Table

               White Officer              Black Officer
-------------  -------------------------  -------------------------------------------------
White Citizen  \(\beta_{0}\)              \(\beta_{0} + \beta_{2}\)
Black Citizen  \(\beta_{0} + \beta_{1}\)  \(\beta_{0} + \beta_{1} + \beta_{2} + \beta_{3}\)

  • \(\beta_{3}\) tests in-group/out-group bias
  • NOT a DiD model — no time dimension

Observational Strategies for Discrimination Have Limitations

Contrast with EE6

  • Cannot separate prejudice from different risk assessments
  • Assumes context variables balanced across officer-citizen race pairs
  • EE6’s experimental design avoids these — randomisation breaks OVB

Exercise 15: Cannabis Prohibition and Student Performance

A Dutch City Banned Cannabis Sales to Foreign Students

Setting and Research Question

The Natural Experiment

  • Maastricht prohibited cannabis sales to non-Dutch students (periods 12-16)
  • Treatment: non-DGB students (from Germany, Belgium — lost access)
  • Control: DGB students (Dutch — kept access)
  • Both groups attend the same university courses

What Makes This a DiD?

  • Treatment affects one group but not the other
  • Both groups observed before and during the policy
  • Pre-treatment data let us check whether the two groups moved in parallel before the policy, which supports (but cannot prove) parallel trends
  • Individual FE absorb baseline differences

Individual Fixed Effects Absorb Level Differences

DiD with Panel Data

\[\text{grades}_{it} = \alpha_{i} + \lambda_{t} + \beta\;(\mathbb{1}[\text{non-DGB}_{i}] \times \mathbb{1}[\text{prohibited}_{t}]) + e_{it}\]

  • \(\alpha_{i}\) absorbs the level difference between DGB and non-DGB students
  • \(\beta\) = DiD estimator: the effect of cannabis prohibition on grades
  • Recall Q8: universal treatment was collinear with time FE
    • Here, the control group (DGB students) breaks the collinearity
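A rank check makes the contrast with Q8 concrete. The sketch below builds a small two-way fixed-effects design matrix under hypothetical dimensions (4 units, 4 periods, policy from period 3) and compares a design with a control group against one where every unit is treated.

```python
import numpy as np

def design(treated_units, n_units=4, n_periods=4, policy_start=3):
    """Columns: treatment indicator, unit dummies, time dummies (period 1 omitted)."""
    unit = np.repeat(np.arange(n_units), n_periods)
    t = np.tile(np.arange(1, n_periods + 1), n_units)
    unit_fe = (unit[:, None] == np.arange(n_units)).astype(float)
    time_fe = (t[:, None] == np.arange(2, n_periods + 1)).astype(float)
    d = (np.isin(unit, treated_units) & (t >= policy_start)).astype(float)
    return np.column_stack([d, unit_fe, time_fe])

X_did = design(treated_units=[0, 1])               # units 2 and 3 act as the control group
X_universal = design(treated_units=[0, 1, 2, 3])   # everyone treated, as in Q8

rank_did = np.linalg.matrix_rank(X_did)
rank_universal = np.linalg.matrix_rank(X_universal)
print(rank_did, X_did.shape[1])                    # full column rank: effect is estimable
print(rank_universal, X_universal.shape[1])        # rank deficient: absorbed by time FE
```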

Exercise 16: School Meals and Academic Performance

A School Meal Programme Was Assigned by Enrolment Number

Setting and Research Question

The Setting

  • Free school meals introduced in 2013
  • Schools with even enrolment numbers received the programme
  • Schools with odd enrolment numbers did not
  • Enrolment numbers are administratively assigned — unrelated to school quality

Why This Matters

  • Random assignment means independence holds
  • With data from both 2012 and 2013, we can use DiD
  • With data from only 2013, simple differences suffice
  • Same logic as EE6 — randomisation breaks OVB

Random Assignment Makes Two Strategies Valid

Three Data Scenarios

Scenario 1 — Both years (DiD)

\[\begin{align*} \text{performance}_{it} &= \beta_{0} + \delta_{0}\mathbb{1}[t = 2013] + \beta_{1}\mathbb{1}[\text{even}_{i}] + \delta_{1}(\mathbb{1}[t = 2013] \times \mathbb{1}[\text{even}_{i}]) \\ &+ e_{it} \end{align*}\]

Scenario 2 — Only 2013 (simple differences)

\[\text{performance}_{i} = \eta_{0} + \eta_{1}\;\mathbb{1}[\text{even}_{i}] + e_{i}\]

  • Valid because enrolment numbers are random — same logic as EE6.

Scenario 3 — Only 2012

  • Impossible: the programme did not exist yet, so there is no treatment variation to exploit
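The three scenarios can be compared in a small simulation. This is a sketch under assumed values: the number of schools, the programme effect, the common 2013 shift, and the noise levels are all hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
n_schools = 20000
true_effect = 2.0                                    # hypothetical programme effect

even = rng.integers(0, 2, n_schools).astype(bool)    # assignment by enrolment-number parity
quality = rng.normal(50, 10, n_schools)              # unobserved school quality
perf_2012 = quality + rng.normal(0, 3, n_schools)
perf_2013 = quality + 1.0 + true_effect * even + rng.normal(0, 3, n_schools)

# Scenario 1: DiD using both years
did = (perf_2013[even].mean() - perf_2012[even].mean()) \
    - (perf_2013[~even].mean() - perf_2012[~even].mean())

# Scenario 2: simple difference using 2013 only
simple_diff = perf_2013[even].mean() - perf_2013[~even].mean()

# Scenario 3: 2012 only. The gap measures baseline balance, not the programme.
gap_2012 = perf_2012[even].mean() - perf_2012[~even].mean()

print(f"DiD = {did:.2f}, simple difference = {simple_diff:.2f}, 2012 gap = {gap_2012:.2f}")
```

Both estimators target the same effect here because assignment is random. In this setup the DiD is the less noisy of the two, since differencing removes school quality from each school's outcome.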

Exercise 17: The Differences-in-Differences Estimator

The 2×2 Table Defines the DiD Estimator

Conditional Means by Group and Period

\[y_{it} = \beta_{0} + \delta_{0}\;\mathbb{1}[t = 2] + \beta_{1}\;\mathbb{1}[\text{Treatment}_{i}] + \delta_{1}\;(\mathbb{1}[t = 2] \times \mathbb{1}[\text{Treatment}_{i}]) + e_{it}\]

| | Before (\(t = 1\)) | After (\(t = 2\)) | Difference |
|---|---|---|---|
| Control | \(\beta_{0}\) | \(\beta_{0} + \delta_{0}\) | \(\delta_{0}\) |
| Treatment | \(\beta_{0} + \beta_{1}\) | \(\beta_{0} + \beta_{1} + \delta_{0} + \delta_{1}\) | \(\delta_{0} + \delta_{1}\) |
| Difference | \(\beta_{1}\) | \(\beta_{1} + \delta_{1}\) | \(\delta_{1}\) |

Two Paths to the Same Estimator

Algebraic Equivalence

Difference in changes

\[\hat{\delta}_{1} = \underbrace{(\bar{y}_{T,\text{after}} - \bar{y}_{T,\text{before}})}_{\text{change for treated}} - \underbrace{(\bar{y}_{C,\text{after}} - \bar{y}_{C,\text{before}})}_{\text{change for control}}\]

Change in differences

\[\hat{\delta}_{1} = \underbrace{(\bar{y}_{T,\text{after}} - \bar{y}_{C,\text{after}})}_{\text{gap after treatment}} - \underbrace{(\bar{y}_{T,\text{before}} - \bar{y}_{C,\text{before}})}_{\text{gap before treatment}}\]
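A small numerical check of the equivalence, using eight made-up observations (two per cell). The data values are hypothetical. The sketch computes both paths from the four cell means and compares them with the interaction coefficient from OLS on the saturated 2×2 regression.

```python
import numpy as np

# Hypothetical data: columns are (treated, after, y), two observations per cell
data = np.array([
    [0, 0, 10.0], [0, 0, 12.0],
    [0, 1, 13.0], [0, 1, 15.0],
    [1, 0, 14.0], [1, 0, 16.0],
    [1, 1, 20.0], [1, 1, 22.0],
])
treat, after, y = data[:, 0], data[:, 1], data[:, 2]

def cell_mean(g, t):
    """Sample mean of y for group g (0 = control, 1 = treated) in period t (0 = before, 1 = after)."""
    return y[(treat == g) & (after == t)].mean()

# Path 1: difference in changes
diff_in_changes = (cell_mean(1, 1) - cell_mean(1, 0)) - (cell_mean(0, 1) - cell_mean(0, 0))

# Path 2: change in differences
change_in_diffs = (cell_mean(1, 1) - cell_mean(0, 1)) - (cell_mean(1, 0) - cell_mean(0, 0))

# OLS on constant, after, treated, and their interaction
X = np.column_stack([np.ones_like(y), after, treat, after * treat])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
delta1_ols = coef[3]

print(diff_in_changes, change_in_diffs, round(delta1_ols, 10))
```

With these numbers the cell means are 11, 14, 15, and 21, so both paths give (21 - 15) - (14 - 11) = 3, and the OLS interaction coefficient matches because the regression is saturated.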

The DiD Estimator Visually

  • Control (blue solid), Treated (gold solid), Counterfactual (gold dashed)
  • Treatment effect = gap between actual treated and counterfactual at \(t = 2\)

Exercise 18: Police Retirement and Sickness Leave

A Retirement Reform Forced Some Officers to Work Longer

Setting and Research Question

The Reform

  • Government raised the retirement age for police officers
  • Officers close to retirement had to serve extra years
  • The number of extra years varies by birth cohort
    • Some officers: 1 extra year; others: 3+ extra years

The Research Question

  • Do officers forced to work longer take more sick leave?
  • Treatment is continuous: \(\Delta\text{years to work}_{i}\)
  • Not binary (treated/control) — intensity varies per officer
  • Officer FE absorb baseline sickness propensity

Treatment Intensity Can Be Continuous

Heterogeneous Treatment in DiD

\[\text{sickness}_{it} = \beta\;(\Delta\text{years to work}_{i} \times \mathbb{1}[\text{post}_{t}]) + \text{controls}_{it} + \alpha_{i} + \lambda_{t} + e_{it}\]

  • \(\Delta\text{years to work}_{i}\): extra years officer \(i\) must serve (continuous, not binary)
  • \(\beta\): extra sickness days per additional year of forced service
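A simulation sketch of the continuous-intensity specification, simplified to two periods and no additional controls (both simplifications are assumptions made here for illustration). With two periods, officer and period fixed effects reduce to a regression of the change in sickness on a constant and the dose. All numbers are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(2)
n_officers = 5000
true_beta = 1.5                                  # hypothetical: sick days per extra year

extra_years = rng.integers(0, 5, n_officers).astype(float)   # dose: 0 to 4 extra years
alpha = rng.normal(8, 3, n_officers) + 0.5 * extra_years      # officer FE, correlated with dose
sick_pre = alpha + rng.normal(0, 2, n_officers)
sick_post = alpha + 0.7 + true_beta * extra_years + rng.normal(0, 2, n_officers)

# Within-officer change regressed on a constant (period effect) and the dose
d_sick = sick_post - sick_pre
X = np.column_stack([np.ones(n_officers), extra_years])
coef, *_ = np.linalg.lstsq(X, d_sick, rcond=None)
period_effect, beta_hat = coef

# Contrast: post-period cross-section slope, which also picks up the officer FE
naive_slope = np.polyfit(extra_years, sick_post, 1)[0]

print(f"beta_hat = {beta_hat:.2f}, naive cross-section slope = {naive_slope:.2f}")
```

Because the simulated officer effect is correlated with the dose, the cross-section slope overstates the effect, while the within-officer regression should recover a value close to the assumed \(\beta\).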

Summary

Panel Data Provides Identification Through Within-Unit Variation

Summary (I)

  1. FD and LSDV address different components of unobserved heterogeneity
  2. Time effects capture common growth; interactions capture heterogeneous effects
  3. Measurement error is amplified by first-differencing — a fundamental trade-off
  4. Panel data cannot solve selection bias
  5. Clustered standard errors correct for within-unit serial correlation

Alternative Functional Forms Are Compatible with Panel Data

Summary (II)

  1. Time fixed effects can be collinear with treatment
  2. Dummies provide nonparametric functional forms at the cost of imprecision at extremes
  3. Fixed effects decompose total effects into within-unit incentive and composition channels
  4. Multiple fixed effects require rotation for identification; F-tests assess joint significance

DiD Exploits Variation Across Groups and Time

Summary (III)

  1. DiD works without panel data — repeated cross-sections suffice if treatment and control groups exist
  2. Interactions test in-group bias without a time dimension — not every 2×2 is a DiD
  3. Pre-treatment trends provide evidence for parallel trends but cannot prove them
  4. Random assignment makes both DiD and simple differences valid
  5. The 2×2 table decomposes the DiD estimator; two algebraic paths yield the same \(\delta_{1}\)
  6. Treatment intensity can be continuous — DiD generalises beyond binary treatment

Next Week: Instrumental Variables

Topic 9

  • When panel data and DiD cannot solve endogeneity
  • The relevance and exogeneity conditions
  • Two-Stage Least Squares
  • Natural experiments as instruments

Appendix: Derivations and Extensions

First-Difference Derivation for General Panel Model

Algebra

Write the model for \(t = 1\) and \(t = 2\):

\[\begin{aligned} y_{i1} &= \beta_0 + \beta_1 x_{i1,1} + \cdots + \beta_k x_{i1,k} + a_i + v_{i1} \\ y_{i2} &= (\beta_0 + \delta) + \beta_1 x_{i2,1} + \cdots + \beta_k x_{i2,k} + a_i + v_{i2} \end{aligned}\]

Subtract: \(a_i - a_i = 0\).

\[ \Delta y_i = \delta + \beta_1\Delta x_{i,1} + \beta_2\Delta x_{i,2} + \cdots + \beta_k\Delta x_{i,k} + \Delta v_i \]

First-Differencing Removes Time-Invariant Unobservables

  • \(a_i\) eliminated by differencing: \(a_i - a_i = 0\)
  • OLS on \(\Delta y_i\) is consistent if \(\mathbb{E}[\Delta v_i \mid \Delta x_i] = 0\), which strict exogeneity implies
  • Time-invariant variables (e.g., distance, gender) also drop out
  • Generalises to \(T > 2\): take consecutive differences for each pair \((t, t-1)\)
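A minimal sketch of the two-period derivation with a single regressor. The slope, the period shift \(\delta\), and the way \(a_i\) enters the regressor are hypothetical choices that make the levels regression visibly biased.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 5000
beta1, delta = 2.0, 0.5                      # hypothetical slope and period shift

a = rng.normal(0, 1, n)                      # unobserved, time-invariant a_i
x1 = a + rng.normal(0, 1, n)                 # regressor correlated with a_i in both periods
x2 = a + rng.normal(0, 1, n)
y1 = 1.0 + beta1 * x1 + a + rng.normal(0, 1, n)
y2 = 1.0 + delta + beta1 * x2 + a + rng.normal(0, 1, n)

def ols(X, y):
    """Least-squares coefficients for y on the columns of X."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

# Pooled OLS in levels (constant, period-2 dummy, x): a_i stays in the error term
X_pooled = np.column_stack([np.ones(2 * n), np.r_[np.zeros(n), np.ones(n)], np.r_[x1, x2]])
beta_pooled = ols(X_pooled, np.r_[y1, y2])[2]

# First differences: a_i cancels and the intercept estimates delta
X_fd = np.column_stack([np.ones(n), x2 - x1])
delta_hat, beta_fd = ols(X_fd, y2 - y1)

print(f"pooled = {beta_pooled:.2f}, FD = {beta_fd:.2f}, delta_hat = {delta_hat:.2f}")
```

In this design the pooled slope converges to about 2.5 rather than 2, because half of the variance of \(x\) comes from \(a_i\), while the first-difference slope should be close to the assumed value.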

Return to Q1

Attenuation Factor: Cross-Section vs First-Differences

Cross-section

\[ \text{plim}\hat\beta_{\text{CS}} = \beta \cdot \frac{\text{Var}(\text{educ}^*)}{\text{Var}(\text{educ}^*) + \text{Var}(e)} \]

First-differences (assuming uncorrelated measurement errors)

\[ \text{plim}\;\hat\beta_{\text{FD}} = \beta \cdot \frac{\text{Var}(\Delta\text{educ}^*)}{\text{Var}(\Delta\text{educ}^*) + \text{Var}(e_{i1}) + \text{Var}(e_{i2})} \]

Why FD amplifies the bias

  • Numerator shrinks: education barely changes → \(\text{Var}(\Delta\text{educ}^*) \ll \text{Var}(\text{educ}^*)\)
  • Noise term in the denominator doubles: \(\text{Var}(e_{i1}) + \text{Var}(e_{i2}) = 2\text{Var}(e)\) when the error variance is the same in both periods
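To see the size of the two attenuation factors, here is a short calculation with hypothetical variances: a signal variance of 9 in levels, 0.5 in changes, and a measurement-error variance of 1 in each period.

```python
def attenuation_cs(var_true, var_err):
    """Cross-section factor: Var(x*) / (Var(x*) + Var(e))."""
    return var_true / (var_true + var_err)

def attenuation_fd(var_delta_true, var_err):
    """First-difference factor with uncorrelated, equal-variance errors:
    Var(dx*) / (Var(dx*) + 2 Var(e))."""
    return var_delta_true / (var_delta_true + 2 * var_err)

var_educ, var_delta_educ, var_e = 9.0, 0.5, 1.0   # hypothetical variances

cs_factor = attenuation_cs(var_educ, var_e)        # 9 / 10
fd_factor = attenuation_fd(var_delta_educ, var_e)  # 0.5 / 2.5

print(f"cross-section factor = {cs_factor:.2f}, first-difference factor = {fd_factor:.2f}")
```

Under these assumed numbers the cross-section estimate keeps 90% of \(\beta\), while the first-difference estimate keeps only 20%.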

Return to Q5

Why Log?

Three Reasons for Using Log-Earnings

  1. Skewness: raw earnings are right-skewed
    • \(\log(\text{earnings})\) yields a more symmetric distribution, closer to normality (supporting AS7)
  2. Multiplicative relationships: if earnings = base \(\times\) skill premium \(\times\) experience premium, then
    • \(\log(\text{earnings}) = \log(\text{base}) + \log(\text{skill premium}) + \log(\text{experience premium})\)
    • Log linearises multiplicative structures into additive ones
  3. Growth rates: \(\Delta\log(\text{earnings}) \approx \%\Delta\text{earnings}\)
    • Differences in logs approximate percentage changes when the changes are small
    • The natural unit for comparing workers across different baseline earnings
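A quick check of the growth-rate approximation, using hypothetical earnings figures. It compares the log difference with the exact proportional change for a small and a large increase.

```python
import math

def log_diff_vs_pct(old, new):
    """Return (difference in logs, exact proportional change)."""
    return math.log(new) - math.log(old), (new - old) / old

small = log_diff_vs_pct(30000, 31500)   # exact change: +5%
large = log_diff_vs_pct(30000, 45000)   # exact change: +50%

for label, (log_diff, pct) in (("small", small), ("large", large)):
    print(f"{label}: log difference = {log_diff:.4f}, exact change = {pct:.4f}")
```

For the 5% rise the log difference is about 0.0488, very close to 0.05. For the 50% rise it is about 0.4055, noticeably below 0.5, so the approximation is best read as a small-change result.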

Return to Q9

Age-Period-Cohort Identification Problem

Why Age, Time, and Individual FE Cannot All Be Included

The collinearity

\[ \log(\text{earnings})_{it} = \alpha_i + \theta_t + \sum_{j=17}^{85} \gamma_j\;\mathbb{1}[\text{age}_{it} = j] + e_{it} \]

suffers from a fundamental identification problem:

\[ \text{age}_{it} = \text{year}_t - \text{birth year}_i \]

  • \(\text{birth year}_i\) is time-invariant, so it is absorbed by \(\alpha_i\)
  • \(\text{year}_t\) is captured by \(\theta_t\)
  • Therefore age is perfectly collinear with \(\alpha_i\) and \(\theta_t\)

Why this matters

  • Cannot separately identify age, period, cohort without restrictions
  • Normalise: set one age effect to zero
  • See Deaton (2018) and McKenzie (2006)
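The collinearity can be seen directly in the rank of the design matrix. The sketch below builds a tiny hypothetical panel (three individuals with consecutive birth years, four survey years) and stacks individual, year, and age dummies, with the usual reference categories already dropped for year and age.

```python
import numpy as np

birth_years = [1980, 1981, 1982]          # hypothetical individuals
years = [2000, 2001, 2002, 2003]          # hypothetical survey years

rows = [(i, t, t - b) for i, b in enumerate(birth_years) for t in years]
ids = np.array([r[0] for r in rows])
yrs = np.array([r[1] for r in rows])
ages = np.array([r[2] for r in rows])     # age = year - birth year

def dummies(values, drop=()):
    """One 0/1 column per distinct value, skipping the levels listed in `drop`."""
    levels = [v for v in np.unique(values) if v not in drop]
    return np.column_stack([(values == v).astype(float) for v in levels])

ind_fe = dummies(ids)                     # all individual dummies (they contain the constant)
year_fe = dummies(yrs, drop=(2000,))      # reference year dropped
age_fe = dummies(ages, drop=(18,))        # reference age dropped

X = np.column_stack([ind_fe, year_fe, age_fe])
rank_deficit = X.shape[1] - np.linalg.matrix_rank(X)

# One extra normalisation: set a second age effect to zero by dropping its dummy
X_norm = np.column_stack([ind_fe, year_fe, dummies(ages, drop=(18, 19))])
rank_deficit_norm = X_norm.shape[1] - np.linalg.matrix_rank(X_norm)

print(rank_deficit, rank_deficit_norm)
```

Even after the standard reference categories are removed, the matrix is one column short of full rank, because the linear component of age equals year minus birth year. Dropping one further age dummy is one possible normalisation that restores full column rank in this example.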

Return to Q9

OLS vs FE Decomposition — Formal

Stayers (\(S\)), Exiters (\(X\)), Entrants (\(N\))

Algebra

\[\begin{aligned} \hat{\beta}_{\text{OLS}} &= \bar{y}_{\text{post}} - \bar{y}_{\text{pre}} \\ &= \left[\frac{|S|}{|S|+|N|}\bar{y}^S_{\text{post}} + \frac{|N|}{|S|+|N|}\bar{y}^N_{\text{post}}\right] \\ &\quad - \left[\frac{|S|}{|S|+|X|}\bar{y}^S_{\text{pre}} + \frac{|X|}{|S|+|X|}\bar{y}^X_{\text{pre}}\right] \end{aligned}\]

\[ \hat{\beta}_{\text{FE}} = \frac{1}{|S|}\sum_{i \in S}(y_{i,\text{post}} - y_{i,\text{pre}}) \]

Interpretation

\(|S|\), \(|X|\), \(|N|\) = number of stayers, exiters, entrants.

  • \(\hat{\beta}_{\text{OLS}}\): all workers — total effect
  • \(\hat{\beta}_{\text{FE}}\): stayers only — incentive effect
  • Not bias — different quantities
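A worked numerical instance of the decomposition, with made-up productivity values for three stayers, two exiters, and one entrant.

```python
import numpy as np

# Hypothetical productivity values by worker type
stayers_pre = np.array([10.0, 12.0, 14.0])
stayers_post = np.array([11.0, 13.0, 15.0])    # each stayer improves by 1
exiters_pre = np.array([6.0, 8.0])             # observed only before the change
entrants_post = np.array([16.0])               # observed only after the change

# OLS on the post dummy: difference in means over everyone observed in each period
beta_ols = np.r_[stayers_post, entrants_post].mean() - np.r_[stayers_pre, exiters_pre].mean()

# FE: average within-worker change, computed on stayers only
beta_fe = (stayers_post - stayers_pre).mean()

# The same OLS number via the weighted formula in the algebra above
n_s, n_x, n_n = len(stayers_pre), len(exiters_pre), len(entrants_post)
post_mean = n_s / (n_s + n_n) * stayers_post.mean() + n_n / (n_s + n_n) * entrants_post.mean()
pre_mean = n_s / (n_s + n_x) * stayers_pre.mean() + n_x / (n_s + n_x) * exiters_pre.mean()
beta_ols_weighted = post_mean - pre_mean

composition = beta_ols - beta_fe
print(beta_ols, beta_fe, composition)
```

Here the post mean is 13.75 and the pre mean is 10, so \(\hat{\beta}_{\text{OLS}} = 3.75\), while \(\hat{\beta}_{\text{FE}} = 1\). The remaining 2.75 comes from who leaves and who joins, not from any change in the stayers.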

Return to Q10

AKM: Matched Employer-Employee Framework

Abowd, Kramarz, and Margolis (1999, Econometrica)

The call centre exercise (Q12) builds on Abowd et al. (1999) and Fenizia (2022).

\[ \log(\text{wages})_{it} = \alpha_i + \psi_{J(i,t)} + x'_{it}\beta + e_{it} \]

  • \(\alpha_i\): worker FE (ability, human capital)
  • \(\psi_{J(i,t)}\): firm FE for the firm \(J\) employing worker \(i\) at time \(t\)
  • Identification requires worker mobility across firms (exogenous mobility)
    • Same logic as Q12: operators rotate across teams
    • Without mobility, \(\alpha_i\) and \(\psi_{J(i,t)}\) not separately identified
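The role of mobility can be illustrated with the rank of the worker-plus-firm dummy matrix. The sketch uses a hypothetical toy panel of four workers, two firms, and two periods. The helper `akm_rank_deficit` is an illustrative name, not part of any library.

```python
import numpy as np

def akm_rank_deficit(assignments, n_workers, n_firms):
    """Columns minus rank for the stacked worker-dummy and firm-dummy matrix.

    assignments: list of (worker, firm) pairs, one per worker-period observation.
    """
    X = np.zeros((len(assignments), n_workers + n_firms))
    for row, (w, f) in enumerate(assignments):
        X[row, w] = 1.0                # worker dummy
        X[row, n_workers + f] = 1.0    # firm dummy
    return X.shape[1] - np.linalg.matrix_rank(X)

# No mobility: workers 0 and 1 always at firm 0, workers 2 and 3 always at firm 1
no_mobility = [(0, 0), (1, 0), (2, 1), (3, 1)] * 2

# One mover: worker 1 switches from firm 0 to firm 1 in the second period
with_mover = [(0, 0), (1, 0), (2, 1), (3, 1),
              (0, 0), (1, 1), (2, 1), (3, 1)]

deficit_no_mobility = akm_rank_deficit(no_mobility, 4, 2)
deficit_with_mover = akm_rank_deficit(with_mover, 4, 2)
print(deficit_no_mobility, deficit_with_mover)
```

With a mover, the matrix is short by one column only, which a single normalisation (for example, one firm effect set to zero) resolves. Without movers it is short by two, so the difference between the two firm effects cannot be separated from the worker effects.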

Return to Q12

Why Day FE and Post Are Collinear

A Concrete Example with Four Days

Treatment starts on day 3:

| Day | \(d_2\) | \(d_3\) | \(d_4\) | \(\text{post}_t\) |
|---|---|---|---|---|
| 1 | 0 | 0 | 0 | 0 |
| 2 | 1 | 0 | 0 | 0 |
| 3 | 0 | 1 | 0 | 1 |
| 4 | 0 | 0 | 1 | 1 |

\(\text{post}_t = d_3 + d_4\) — an exact linear combination of the day dummies. Stata would drop one variable automatically. The treatment effect \(\mu\) cannot be separated from the day effects.

Replacing Day FE with a Parametric Trend Restores Identification

  • Resolution: replace \(T-1\) day dummies with a single parametric trend (\(\gamma\;\text{time}_t\))
    • Fewer parameters → no collinearity → \(\mu\) is identified
  • Q8 application: all municipalities receive increased police force simultaneously — \(\text{post}_t\) is perfectly collinear with day FE
  • Q11 application: contingent pay switches on a single date for all workers — same collinearity
  • Trade-off: a linear trend is restrictive (assumes the time effect changes by the same amount every day) but estimable; day FE are flexible but break identification when treatment is universal
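Both points can be verified with a rank check on the four-day example. The sketch stacks the four days for a hypothetical three units, all treated from day 3, and compares the design with day dummies against the design with a linear trend.

```python
import numpy as np

n_units = 3                                               # hypothetical number of units
days = np.tile(np.arange(1, 5), n_units).astype(float)    # days 1..4 for each unit
post = (days >= 3).astype(float)                          # treatment starts on day 3 for everyone
const = np.ones_like(days)
day_fe = np.column_stack([(days == d).astype(float) for d in (2, 3, 4)])

X_day_fe = np.column_stack([const, day_fe, post])         # constant + day dummies + post
X_trend = np.column_stack([const, days, post])            # constant + linear trend + post

deficit_day_fe = X_day_fe.shape[1] - np.linalg.matrix_rank(X_day_fe)
deficit_trend = X_trend.shape[1] - np.linalg.matrix_rank(X_trend)
print(deficit_day_fe, deficit_trend)
```

The day-dummy design is one column short of full rank because \(\text{post}_t = d_3 + d_4\), whereas the trend design has full column rank, so the coefficient on \(\text{post}_t\) is estimable there.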

Return to Q8

References

Abowd, J. M., Kramarz, F., & Margolis, D. N. (1999). High Wage Workers and High Wage Firms. Econometrica, 67(2), 251–333. https://doi.org/10.1111/1468-0262.00020
Deaton, A. (2018). The Analysis of Household Surveys: A Microeconometric Approach to Development Policy (Reissue Edition with a New Preface). World Bank Group. https://doi.org/10.1596/978-1-4648-1331-3
Fenizia, A. (2022). Managers and Productivity in the Public Sector. Econometrica, 90(3), 1063–1084. https://doi.org/10.3982/ECTA19244
Marie, O., & Zölitz, U. (2017). High Achievers? Cannabis Access and Academic Performance. Review of Economic Studies, 84(3), 1210–1237. https://doi.org/10.1093/restud/rdx020
McKenzie, D. J. (2006). Disentangling Age, Cohort and Time Effects in the Additive Model. Oxford Bulletin of Economics and Statistics, 68(4), 473–495. https://doi.org/10.1111/j.1468-0084.2006.00173.x